One way of looking at the craft of producing social, economic and
territorial information is as the art of describing the world. Statistics
and maps carry the phenomena of reality to scales suited to the perspective
of our human vision and allow us to think and act at a distance, building
two-way avenues that join the world and its images. The greater the power
of synthesis of these representations, combining, with precision, dispersed
and heterogeneous elements of everyday life, the greater our knowledge and
our capacity to understand and transform reality.

Seen as art, the craft of producing this information reflects the culture
of a country and of its time, how that culture sees the world and makes it
visible, redefining what it sees and what there is to be seen.

In the setting of continuous technological innovation and cultural change
in contemporary society, the new information technologies - bringing
together computers, telecommunications and information networks - accelerate
that movement of mobilising the real world. The speed at which information
accumulates is increasing, and its requirements are expanding: for updating,
for format - more flexible, personalised and interactive - and, above all,
for accessibility. The digital platform is establishing itself as the
simplest, cheapest and most powerful means of handling information, making
new products and services possible and winning new users.

We believe that the environment of conversation, controversy and exchange
between the different disciplines, in the round tables and thematic sessions
of the National Conferences of Geography, Cartography and Statistics and of
the Symposium on Innovations, is the one that best fosters a better consensus
on the phenomena to be measured in order to portray the society, the economy
and the national territory, and on the priorities and formats of the
information needed to strengthen citizenship, to define public policies, to
manage the country politically and administratively, and to create a fairer
society.

EDI: ELECTRONIC DATA INTERCHANGE
FOR STATISTICAL DATACOLLECTION AND DISSEMINATION

W.J. Keller, Statistics Netherlands, The Netherlands



ABSTRACT 

In this paper, we will present some experiences in the Netherlands with EDI for statistical
datacollection and dissemination. We will consider the changes to be made for large scale
EDI datacollection. We will argue that EDI demands a dramatic redesign of the way we
collect and process statistical information, but the rewards in terms of response burden,
quality and efficiency might be well worth it.

We will also discuss some projects at Statistics Netherlands dealing with EDI for statistical
dissemination. We will cover Statline (our statistical database with a traditional on-line
query tool on a remote DOS client) and its new experimental version, with so-called
dynamic Web pages on the Internet. We will argue that the Internet not only provides great
opportunities but also great challenges for statisticians.

We will focus, besides on the technological aspects, on the conceptual and organisational
implications of EDI for datacollection and dissemination.

Keywords: Official Statistics, Datacollection, Dissemination, EDI, Internet, Meta-information

1. OFFICIAL STATISTICS

National Statistical Institutes (NSIs) are confronted with several strategic issues resulting
from new demands from our customers as well as new developments in Information
Technology (IT). Efficiency and market-orientation are the key-words now. We need to
produce at lower costs. Furthermore we need to lower the costs we inflict upon our suppliers
of data. The outcome should be a product that, although not actually sold on a market, our
clients eventually want. Furthermore we are confronted with new developments in IT. They
will give us the opportunities to construct the necessary tools to meet the new demands. In a
situation like this an NSI needs to make the right strategic choices.

The statistical production process is influenced by the growing demands of our clients
(output) and of our respondents (input). Concerning our output we see a demand for better
access, preferably electronic, and greater user-friendliness. One particular aspect is a
demand for an improvement of the coherence of the totality of the information we offer.
With respect to our input, there is a strong political demand for a decrease in the respondent
burden as a part of alleviating the administrative burden of enterprises. An NSI like Statistics
Netherlands sends out a million questionnaires to enterprises and other institutions per annum.
Large and medium-sized enterprises may receive as many as 50 questionnaires per year,
including repetitive monthly and quarterly surveys. The conclusion is clear: NSIs have to
reduce the "form-filling burden". Furthermore, budgets are shrinking so there is a demand for
higher efficiency and higher productivity.



2. INFORMATION TECHNOLOGY (IT)

We are also blessed with new IT developments: the technology push. These developments
give us new technical possibilities, the means to construct new tools for our production
process. We see large improvements in the possibilities of data processing, data storage and
data transmission. The last aspect will probably have the most striking influence on our work:
the electronic data interchange (EDI) between our respondents and the NSI on the one hand
and the EDI between the NSI and its clients on the other.

But these new IT developments also create their own demand outside the NSI. The new
technology will be used anywhere. Our suppliers of data will use it. Our clients will use it.
They will no longer be satisfied to communicate with us in the old way, that is on paper. Our
suppliers produce their data by electronic means and will want to use those means to deliver
those data directly to us in order to minimise their own costs. Our clients process our data by
electronic means. They will demand to be able to select and receive those data with the tools
that IT has to offer.

These factors lead to the conclusion that the NSI will have to make those strategic choices in
its production process that make the best use of the possibilities IT has to offer. The potential
of IT will affect all aspects of our production process. To describe them let us discern, within
this production process, three stages. The input-phase is where the data are collected in
contact with the respondents. In the throughput-phase these data are processed to produce the
information with the characteristics we are actually looking for. In the output-phase this
information is offered to and disseminated among our clients. In this paper, we will
concentrate on the external effects, so on the input and output phase.

Since the output phase defines our product and relates to our customers, we will consider it
first. Here the new developments probably get the most attention from the public. We see the
new media by which information can be presented to its users. Paper publications may
continue to play their role but especially the more professional user will want to select and
receive his data by electronic means. NSIs are producing or developing those means: data on
CD-ROM, data on Internet. More important and much more difficult is the way data should
be presented with those new media. The amount of information will be much larger than we
had in our paper publications. At that point the management of the meta-information becomes
crucial.

For this purpose NSIs are developing statistical databases, intended for end-users, giving
access to "all" our data. As could be expected, structuring those data is the main problem. At
the same time we are confronted with lacking coherence due to lacking statistical
co-ordination. Output databases are intended to play a key-role in the dissemination process of
our data. In the future, the strategic choice should be made that we aim for such a structure
wherein all publications and all other dissemination of data goes through the database.
Section 3 and following will discuss the output strategies in more detail.




Besides the output side, especially the data-collection will be re-ordered. Besides the use of
Computer Aided Interviewing (CAI) for household surveys, also EDI will play a key role
here, in particular for establishment surveys. Because of the nature of EDI, no longer the
demand for information (i.e. each statistical survey), but the supply of information from our
respondents will dictate the organisation there. And since most information from our
respondents indirectly comes from electronic data-sources like bookkeeping systems and
registers, we should concentrate on these sources. If possible, each source should be tapped
electronically once and completely, using EDI, for any possible use within the NSI. The
collection is technically and conceptually adapted to that source. In section 7 below we will
elaborate on this view.

In this paper we focus on EDI at the output and input side of the statistical process. We start
with the output side.



3. DISSEMINATION

At present, most statistical agencies provide aggregated statistical information in various
ways, but dominantly in printed form (on paper). Because printing is a relatively cumbersome
and expensive way of dissemination, more and more people are looking at the electronic
highway (a.k.a. the Internet) as a cheap and easy way to disseminate statistical information.
This paper focuses on the impact this trend has on official statistics. We will argue that
besides the technological dimension of publishing on the Internet, the main problems will be
conceptual, i.e. those of statistical co-ordination and integration.



In this paper we will discuss some projects at Statistics Netherlands (SN) dealing with EDI
for dissemination. We will cover Statline (our statistical database with a traditional on-line
query tool on a remote DOS client) and its new experimental version, Statline-Witch, with
so-called dynamic Web pages on Internet. We will argue that by combining the ease of use
and ease of access of the Internet with the multi-dimensional database systems found in
statistics, great opportunities for statistical dissemination will arrive.

Presently, our publications take many different shapes: printed paper, floppy disks, faxes and
CD-ROMs, automatic and human voice response, press releases, videotex, etc. Behind all
these different media there is (aggregated) statistical information, often in machine readable
form e.g. as the output of survey processing systems. Needed is a "one-stop" dissemination
database situated between the internal processing systems and the outside world, capable of
producing many different media from one source, in a consistent, timely and efficient way.
Besides on-line access to the database, such a database system could also automatically
provide the information for other media, such as floppies, email subscriptions, faxes and
CD-ROM publications. But one of the most important objectives of such a system is to provide
easier on-line access by our customers to the wealth of information at statistical bureaus. It is
our opinion that in this respect the Internet will play a very important role in the near future.

With the rise of the graphical browsers (Mosaic, Netscape) on the Internet, the net has grown
immensely during the last year. Within months, nearly every respectable company has set up
its own so-called "Web-server" on the World Wide Web (WWW). The net, with its
sky-rocketing popularity and therefore great infrastructure, is already connecting tens of
millions of people all over the world, with access becoming easier and bandwidth nearly free
(in Holland, the Internet, at 28 kbps speed, will be a local phone call away for nearly everyone
at the end of 1995). It allows statisticians not only to collect information more efficiently (see
our paper on EDI), but also to disseminate aggregate statistics more efficiently, with a marginal
reproduction and distribution price close to zero. (There are already 27 000 free internet
subscribers to David Letterman's Top Ten Listserver; imagine such a circulation for our press
releases!)

At present, several statistical institutes publish information on the net through the WWW.
Well-known Web-servers are those from the US Census Bureau, Statistics Canada, Eurostat
and SN, to name a few. Everyone with an internet connection and a browser like Netscape
can visit these servers from all over the world. Most of the material published on these
Web servers, however, is not really statistical information, but lists of publications, press
releases, and general information for the public. The limited amount of truly statistical figures
is often presented in a documentary way, i.e. as electronic copies of the printed pages from
traditional publications.

This approach, which is typical for so-called static Web pages, makes it difficult to
manipulate statistical figures as structured information, since the user only has access to
documents, i.e. (formatted) text. What is really needed is access (through the Internet) to a
real database, encompassing various statistical sources in an integrated system. Once our
statistical information is available in a structured, machine readable way, we can manipulate
it and present it in any form, including unstructured (like a text document) and structured
(e.g. as a spreadsheet). This structured database approach is also necessary in order to be
able to provide better co-ordinated and integrated statistical information.

An example of a statistical database is Statline, from SN. Statline is based on the client/server
concept, where the front end (running on a PC, possibly outside SN) is separated from the
back-end (the database server, located at SN). Front end and back-end are presently connected
through traditional datacommunication facilities like Local Area Networks (LANs) internally
or simple asynchronous lines (using telephone lines and modems) externally. In order to
optimize the performance of its multi-dimensional database (see section 4), Statline uses a
proprietary, non-relational database design based upon indexed files. The DOS-based
front-end uses a user-friendly window/mouse desktop metaphor where the results of searches
are displayed in a type of multi-dimensional spreadsheet, with additional graphical views,
including thematic maps. The Statline front-end is the same as the software we use as
interface to our floppy disk (or CD-ROM) based publications. Presently, Statline does not
use the Internet, but this will change when we introduce the concept of dynamic Web pages.



4. DYNAMIC WEB PAGES: COMBINING INTERNET WITH DATABASES 

As discussed above, the statistical information found on ordinary Web pages on the internet is
difficult to manipulate in a structured way, in view of the documentary (non-numerical)
character of a Web page. Also, each Web page is static in nature, i.e. we have to prepare each
page beforehand by storing its (documentary) image on the Web server. Wouldn't it be great
to combine the power of on-line databases, like our Statline database, with the ease of use and
access of the World Wide Web? This is where the so-called dynamic Web page enters the
picture. The idea is to use browsers like Netscape as front end to systems like Statline. Each
time a user requests data, a special interface, called WITCH, translates the request to the
Statline format and generates a Web page on-the-fly to present the result from Statline to the
user.
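
The request-translate-generate cycle of a dynamic Web page can be sketched in a few lines
of Python. This is only an illustration of the idea, not the WITCH interface itself: the
in-memory DATABASE, the query parameters var and year, the function names and all figures
are assumptions made for this sketch.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical stand-in for the statistical database: (variable, year) -> value.
# The figures are made-up placeholders, not published statistics.
DATABASE = {
    ("employees", "1993"): 100,
    ("employees", "1994"): 110,
    ("profit", "1993"): 7,
    ("profit", "1994"): 9,
}


def select(variable, years):
    """Translate the request into a database lookup; return (year, value) rows."""
    return [(y, DATABASE[(variable, y)]) for y in years if (variable, y) in DATABASE]


def render_page(query_string):
    """Build an HTML page on the fly for a request like 'var=profit&year=1993'."""
    params = parse_qs(query_string)
    variable = params.get("var", [""])[0]
    years = params.get("year", [])
    rows = select(variable, years)
    body = "".join(f"<tr><td>{escape(y)}</td><td>{v}</td></tr>" for y, v in rows)
    return (
        f"<html><body><h1>{escape(variable)}</h1>"
        f"<table><tr><th>year</th><th>value</th></tr>{body}</table></body></html>"
    )


# Nothing is stored beforehand: the page exists only as the answer to this request.
page = render_page("var=profit&year=1993&year=1994")
```

The point of the sketch is that the page is derived from structured data at request time,
so the same selection could just as well be written out in another format.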



An example of a WITCH generated Web page, using Netscape 1.1 with HTML3 table
support, is shown below.




By using a Web-browser as front-end to a database with structured information, other
Web-tools also become available. For example, besides presentation in a Web format, we can
also download information or use other "viewers" in the browser, e.g. to see spreadsheets,
graphs, or maps from the net. WITCH will not only generate dynamic Web pages but other
formats like spreadsheets as well. In this way, the user can save information in a structured
format in order to manipulate the data later.

The advantages of this approach are several: first, we don't have to build our own front end
tool, like we did with Statline for DOS. Anyone with a decent Web-browser can access
Statline, wherever in the world. Second, by using the commonly available Web browser,
Statline becomes immediately available on different platforms (Windows, Mac, UNIX).
Third, the user does not have to learn a new interface, once the Web browser is known. Finally,
we can use the Internet as communication medium, with all its advantages: high bandwidth
(28.8 kbps by modem or even better in case of ISDN or T1 links) and great accessibility (as
said before, in the Netherlands the Internet will be a local phone call away for nearly everyone
at the end of 1995).

With millions of potential users being able to access our on-line statistical databases over the
Internet, new challenges arrive. The biggest concern, as we see it, is the statistical
co-ordination of the information we provide.



5. STATISTICAL CO-ORDINATION

Most statistical bureaus provide hundreds of different statistical publications from several
hundreds of surveys. All this amounts to millions of figures, thousands of tabulations, and
many, many different sources of information. But except for some special publications (like
the National Accounts), each publication only deals with a very specific topic, and users are
confronted with an inaccessible "goldmine of information" with many, many different faces.
Someone being interested in, say, automobiles, has to look in more than a dozen publications
to get a total picture, encompassing the production of cars, the exports and imports, the use
(in time and mileage), the energy consumption, traffic accidents, the environmental effects,
etc. Finding all this information can be laborious and troublesome, especially since each
statistical department focuses only on their themes and publications. At the same time, NSIs
sell only a very limited number of copies of each individual publication, often without
recovering the full dissemination costs, let alone the collection costs. And finally, while users
appreciate our impartiality and accuracy, they complain about the lack of timeliness of our
statistical information.

If all available statistical information is placed on the internet, free of charge, millions of
users can and will access it. Compare this with the hundreds of users reading our printed
publications. However, not only the implications in terms of distribution are mind-boggling;
also the conceptual implications will be vast and probably very problematic. Why? Since
with such an unlimited access to all statistical information, users will ask for much better
access paths (with search by keyword and multi-dimensional queries, on time, branch, region,
etc., on top). And then, after we have provided these tools, they will find out that our
information is not always co-ordinated, let alone integrated. Inconsistencies, buried in
hundreds of different paper publications, will become visible on the net, and users will start
asking questions: not only for more, but also for better co-ordinated and better structured
information.

One answer to this demand for better co-ordinated data is the systems approach, like National
Accounts. Another, less ambitious goal, is to co-ordinate the classifications, domains and
definitions used in different statistical publications. This is the philosophy behind a new
database approach, based on the concept of multi-dimensional tables or cubicles.

As in Computer Assisted Interviewing (CAI) systems, we can distinguish between the data
itself and its description, the so-called metadata. While the CAI systems focus on the
individual data and metadata processed in the data collection and editing stages, in the
dissemination database we focus on the aggregated data and its metadata. This metadata
comprises both the syntax (format) and semantics of the data (the definitions of the published
variables, like the definition of "number of employees"), as well as a description of the survey
itself, the sources and how the items are derived. The first step to co-ordinate statistical
publications is to standardize the definitions of the variables used.

Each item (e.g. number of employees) is often available for different domains, defined by
crossings of discrete, categorical variables, like sector, region or time. Another important
mechanism to co-ordinate the dissemination of statistical data is the standardization of these
categorical variables, leading to classifications e.g. for branches of industries, commodities,
regions, etc. The basic representation of information used in such a database is therefore the
multi-dimensional matrix (sometimes called "cubicle") where one dimension reflects the
different variables (e.g. number of employees, profit, prices), a second one the (discrete) time
axis (e.g. years and months), while other dimensions correspond to various classifications
(industries, commodities, regions, etc.). The items inside the matrix reflect the measurements
("number of") on a certain variable ("employees") in the domain defined by the crossing of
the categories on the other axes ("in industry x in region y at time t"). Often, categories are
classified into different systems of detail (e.g. an n-digit industries classification, with n=1..9)
which are often (but not always!) hierarchical to one another, resulting in levels of
classification.
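
The multi-dimensional matrix described above can be illustrated with a minimal Python
sketch. It is not Statline's proprietary storage design: the class name Cubicle, its
methods, the axis names and all figures are hypothetical and serve only to show how a cell
is addressed by one category per axis.

```python
class Cubicle:
    """A small multi-dimensional table: each cell is addressed by one category per axis."""

    def __init__(self, axes):
        # axes maps an axis name to its list of allowed categories (the classification)
        self.axes = axes
        self.cells = {}

    def _key(self, coords):
        # Reject categories that are not part of the axis classification.
        for axis, category in coords.items():
            if category not in self.axes[axis]:
                raise KeyError(f"{category!r} is not a category of axis {axis!r}")
        # A cell needs one coordinate for every axis, in the fixed axis order.
        return tuple(coords[axis] for axis in self.axes)

    def put(self, value, **coords):
        self.cells[self._key(coords)] = value

    def get(self, **coords):
        return self.cells.get(self._key(coords))

    def slice(self, **fixed):
        """Return all filled cells whose coordinates match the fixed categories."""
        names = list(self.axes)
        return {
            key: value
            for key, value in self.cells.items()
            if all(key[names.index(axis)] == cat for axis, cat in fixed.items())
        }


# Made-up figures: employees in industry/region "x" and "y" for two years.
cube = Cubicle({
    "variable": ["employees", "profit"],
    "time": ["1993", "1994"],
    "region": ["x", "y"],
})
cube.put(115, variable="employees", time="1993", region="x")
cube.put(120, variable="employees", time="1994", region="x")
cube.put(80, variable="employees", time="1994", region="y")

employees_1994 = cube.slice(variable="employees", time="1994")
```

Because every axis carries its own list of categories, two cubicles that use the same list
for an axis share that classification, which is the co-ordination the text asks for.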

Metadata (descriptions) in this database of cubicles can refer to the total matrix, to the axes
and their variables and categories, and to the individual items inside the matrix. Particular
problems of metadata arise when the definitions of certain categories (like regions, industries,
commodities) change over domains, in particular over time. As an example, take a region like
a municipality. Not only is the number of inhabitants in Amsterdam in 1980 different from
1990, but also the definition of Amsterdam itself differs between the two years (e.g. because
of border corrections)! Similar problems arise when certain items are only available for
certain categories or classification levels, making comparison of different items in various
domains sometimes impossible.

As explained above, in statistical databases the most important type is the multi-dimensional
object, or cubicle. A statistical database will contain many, many different cubicles, which
might all share similar classifications along some of their axes. Besides these
multi-dimensional objects, also simple ("flat") two-dimensional cross-tabulations, as shown in
most traditional statistical publications, have to be stored and presented in the database, as
well as (one dimensional) text objects like press releases. All this information is documented
(metadata) inside the database on various levels (from the total object down to the individual
items or cells). A classification of database objects into the well-known statistical domains
(like economic, social and demographic statistics), and classifications thereof (production,
environment, labor-market, well-being, etc.), makes navigating through this immense database
of information more feasible. A very strong tool in finding the information needed is a
database-wide keyword (thesaurus) system, which allows the user to quickly locate the right
object.
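
A database-wide keyword system of this kind can be sketched as follows. The thesaurus
entries, the catalogue titles and the function name find_objects are all hypothetical; the
sketch only assumes that synonyms are first mapped to one preferred keyword and that each
database object is tagged with preferred keywords.

```python
# Hypothetical thesaurus: synonyms point to one preferred keyword.
THESAURUS = {
    "cars": "automobiles",
    "motor vehicles": "automobiles",
    "jobs": "employment",
}

# Hypothetical catalogue: database objects and the preferred keywords attached to them.
CATALOGUE = {
    "Production of passenger cars": {"automobiles", "production"},
    "Traffic accidents by region": {"automobiles", "accidents"},
    "Employees by industry": {"employment", "industries"},
}


def find_objects(term):
    """Resolve the search term through the thesaurus, then list matching objects."""
    keyword = THESAURUS.get(term.lower(), term.lower())
    return sorted(title for title, keywords in CATALOGUE.items() if keyword in keywords)


hits = find_objects("Cars")
```

A user interested in automobiles then finds objects from different statistical departments
in one search, instead of looking through a dozen separate publications.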

In several countries, statistical databases based on this concept of cubicles are presently being
used or under construction. Well-known systems include PC-Axis from Statistics Sweden, the
ABS-Database from the Australian Bureau of Statistics and the above mentioned Statline
database from SN. Statline is both an internal SN system as well as an open on-line database



our metadata. 

Outside SN, Statline provides a direct connection for our large accounts to the wealth of
information Statistics Netherlands provides. By combining it with the WITCH interface, using
the idea of dynamic Web pages, we will allow for easy access to Statline over the Internet to
many other customers.



6. DISSEMINATION: CONCLUSIONS 

With the spectacular rise in the use of the Internet world-wide, electronic publishing quickly
becomes a reality. The internet and in particular the WWW (World Wide Web) not only
provides great ease of use and ease of access to an immense universe of information, it also
provides great challenges to statisticians. Should we simply put all our paper-based
publications in electronic form on a Web server, using the same document form as we did in
printing? Or should we make our statistical information available in a more structured way?
We think that the technology of the so-called dynamic Web page as a front-end to a database
with statistical figures will be a better solution than static Web pages, which in some way just
replicate the paper metaphor.

More in general, once statistical information is available in a structured, machine-readable
format like Statline, we can present it in any form by just using interfaces like WITCH. From
such a database, not only Web pages can be generated on the fly, but also fax/email messages,
press releases, databases on CD-ROM and even old-fashioned printed output in a completely
automated way. Of course, not only the data itself but also the metadata should be machine
readable, including the syntax (format) and the semantics (content) of the data. Once this is
achieved, we can easily exchange information between statisticians using standard export
formats like Eurostat's GESMES, like we now use WordPerfect exports from an MS-Word
document. All it takes is a structured, machine readable and documented form of storage of
all statistical information.




developed BLAISE to do so. (Needless to say that BLAISE does more than develop and
present electronic questionnaires.) The gains of these developments were mainly in terms of
an increase in productivity or efficiency. The number of staff needed for coding, data entry and
checking decreased dramatically. This efficiency also shows itself in the much faster
production of results. Still, there is even more to gain. In the first place on the efficiency of
the production process itself. But also in the statistical sphere improvements are still possible:
new ways of interviewing (CASI, computer aided self interviewing) and, not directly a matter
of IT, more efficient sample designs.

Much more however is still to be done in the field of collecting data among enterprises. The
demands here are stronger. Response burden has become an issue. It is the driving factor
behind our strategic choices here. When we see at the same time that almost everywhere
automation and IT have invaded the bookkeeping systems of the respondents involved, it is
clear what our task for the near future will be: the EDI-fication of the collection of
information from enterprises by the NSI. What CAI is for interviewing among households,
EDI (electronic data interchange) will be for data-collection among enterprises. Later in this
paper we will go deeper into EDI with enterprises.

In the new situation, we are talking more than 10 years from now, especially the
data-collection will be re-ordered. No longer the demand for information but the supply, the
available actual data-sets, will dictate the organisation there: the sources. Each source will be
tapped once and completely for any possible use within the NSI. The collection is technically
and conceptually adapted to that source. (In the remaining sections of this paper we will give
some indication regarding the nature of those sources.)

Having collected the data we may have to translate them to statistically suitable concepts,
integrate them and we will have to distribute them among users. They may be inside the NSI,
the integrative systems like the National Accounts, or outside the NSI. This means that
somewhere those data will have to come together for distribution. For the input-side this can
be illustrated as follows:




Process: old vs. new (2000+)

On the left we see the old situation with a separate production line for each individual statistic
(i.e. stove pipes). On the right the future situation. There, all the possible sources contribute to
a central database of relevant information. From that database the actual statistics are
produced by combining the relevant information. It is evident that in order to combine
information one should be certain that the characteristics of that information are such that
combination makes sense. Those characteristics are specified in the meta-information.
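
The role of the meta-information as a guard on combination can be sketched in Python. The
Meta fields, the combine function and the figures are assumptions made for this
illustration; the paper does not prescribe which characteristics must agree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meta:
    """Hypothetical meta-information attached to one source in the central database."""
    variable: str        # what is measured
    unit: str            # unit of measurement
    period: str          # reference period
    classification: str  # classification used for the categories


def combine(sources):
    """Sum values per category, but only if all sources share the same meta-information."""
    metas = {meta for meta, _ in sources}
    if len(metas) != 1:
        raise ValueError(f"meta-information differs: {sorted(map(str, metas))}")
    totals = {}
    for _, values in sources:
        for category, value in values.items():
            totals[category] = totals.get(category, 0) + value
    return totals


# Made-up example: two sources that describe the same concept for different units.
turnover_1994 = Meta("turnover", "1000 guilders", "1994", "industry")
register_part = (turnover_1994, {"retail": 300, "transport": 120})
survey_part = (turnover_1994, {"retail": 50})

total = combine([register_part, survey_part])
```

A source recorded for another period or in another unit would be rejected here instead of
being silently added, which is the check the meta-information is meant to make possible.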



8. ELECTRONIC DATA INTERCHANGE (EDI)

From now on we will focus on EDI with enterprises and institutions. An NSI collects data to
produce statistical output. What needs to be done is making a translation from the data of the
respondent to the data of the output. This is done in several steps. The first step may be left to
the respondent; if so, it leads to a certain response burden.

The first step of the translation involves two parts. First there is the conceptual translation,
the mapping of the concepts of the source, the administrative concepts, on the concepts to be
delivered to the NSI. This is the most difficult part. Not only do business records differ from
statistical information but they also differ among themselves. The second part of the
translation is a technical one. We would like to receive data in a suitable technical form.
Especially we and our respondents would like to avoid data-entry.

Electronic data interchange will be one of the strategic tools to meet the challenge of lowering
the response burden and improving our productivity. In every individual case we should
decide whether to use it and in what mode. We will describe several modes of EDI and judge
them by their effect upon the response burden. Of each possibility we will indicate the nature
of the translation and especially who is going to make it. We concentrate on the conceptual
translation.




a. EDI on centrally kept registers

Here we do not approach the individual respondent at all. We are dealing with centrally kept
information on individual units, collected for other purposes than statistics and yet of interest
to the statistician. In itself this way of collection creates no response burden.

In the Netherlands there are several examples of usable registers. There are centrally kept
registers of enterprises with the chambers of commerce. The tapes of these registers feed our
own register of statistical units. Statistical data can also be had from fiscal (company tax,
VAT) or social security sources. For several possibilities (chambers of commerce, company
tax and VAT) the possibilities are used or being researched.

b. Commercial bookkeeping bureaus

A related possibility is tapping from the information of commercial bookkeeping bureaus.
They keep the records on financial information or regarding the wages of sometimes a large
number of individual enterprises. This possibility also is attractive because of the large
number of respondents involved with only one link. Furthermore these service bureaus will
be capable of providing us with more information than e.g. the fiscal records contain. A
disadvantage is that these service bureaus probably will charge their clients for answering the
questions of the NSI. Not every client will be prepared to pay.

c. EDI on individual reqiondents 

When the above described possibilities are not available we will have to approach the
individual respondent. In doing so we should be aware of the fact that sometimes we will
have to discern within one statistical unit, often an enterprise, several sets of administrative
records. We will see that we will have to approach these subsets separately and in a different
manner. Within commercial enterprises we find the financial records, the logistical
information (foreign trade, stocks) and the records on wages and employment. Especially the
financial records and those on wages are strictly separated in the Dutch situation.

Here we classify by the translator of the information.

c.1 The NSI translates

One of our EDI-projects - EFLO - works along this line. It deals with the data from the Dutch
municipalities. They deliver a set of records directly tapped from their own complete set of
records. The translation is done at Statistics Netherlands. The advantages in terms of
respondents' burden are evident. Although extra work by the NSI is needed, this extra work
can be seen as an investment depending on the stability of the translation scheme. It is
expected that this form of EDI will lead to an improvement of productivity once the
translation schemes are completed. Important is that we are here dealing with a limited
number (600) of respondents.



c.2 The respondent translates to a standard record




Here a standard record of information is defined. The standardisation regards both the
conceptual and the technical aspects. To produce the record, writing the software, is left to the
respondent. Working with a standard record is not always possible. It can only be done when
the information is already standardized among respondents to a certain degree. Furthermore,
to make a standard record possible the NSI sometimes may have to move towards the
concepts of the respondent. In that case a larger part of the total translation to the final
statistical output has to be done by the NSI. Especially when the standard record is available
in the bookkeeping software the respondent uses and regularly updates, this mode of EDI has
a clearly favourable effect on the respondents' burden.

There are two examples. One is CBS-IRIS, the EDI on intra-EC trade. The standard record
developed here is implemented in over 40 software systems available on the Dutch market,
after certification by Statistics Netherlands. The EGUSES project is the other example. It
regards wage information. That subset of company records is highly regulated in the
Netherlands. That fact made it possible to define a standard record.

c.3 The respondent translates, no standard record

Still a very large part of the information we are looking for is left out. The respondent has it
in a form that conceptually and technically differs from what the NSI wants and from what
other respondents have. The last possibility is that the NSI provides the software by which the
respondent can set up a translation scheme for both the technical and the conceptual
translation. Once set up, and in so far as no changes occur, the scheme can be used to produce
data to be delivered to the NSI. The example here is EDI-Pilot 2 directed at the financial
records and described in the next section.



9. EDI-PILOT 2

We will now describe the project EDI-Pilot 2 directed at the financial records of individual
enterprises as an example. It shows the problems one has to face. While describing Pilot 2 we
can refer to the scheme in the previous section.

Pilot 2 is directed towards individual financial accounts. In the Dutch situation these are only
a part of the accounts of an enterprise. Especially the accounts on wages and employment are
excluded. This is not a choice voluntarily made by Statistics Netherlands but one forced upon
us by the way the bookkeeping systems are organised in our country. Leaving out detailed
questions on wages, we combine within Pilot 2 all the questions that are put to the financial
accounts. The result is the combined questionnaire.

The contents of the combined questionnaire are dictated by what is available in the financial
accounts. Regulated as our society may be, the financial accounts may diverge strongly in
internal organisation and in the concepts used. In the first place this means that we will have
to adapt our questions towards the possibilities of the automated system of the enterprises.
This may imply more statistical work for the NSI to reach the same output. If one wants more,
it will probably be necessary to ask for additional information to be given explicitly by the
respondent, that means by data-entry. In the second place the diversity of respondents means
that a unique translation scheme will have to be set up and maintained for each respondent.




Financial accounts also differ in their technical lay-out. A large number of bookkeeping
software systems is in use. There is no standard record for information to be selected
electronically from the software and it is not expected that it will be possible to define one
within the near future. As the main goal of Pilot 2 was the lessening of the respondents'
burden, it was decided that the amount of data-entry was to be minimised.

That means that some ingenuity was needed to create the automated link we were looking for.
This is done by using the reports or print-outs of the software system. Instead of being
printed they are sent to a file, a print-file, to be read by the translator. The translator is
the main part of the software module that will run on the respondent's computer and that is
now being developed as part of Pilot 2. The layout of the reports and thus of the printfiles is
fairly stable. The respondent communicates this lay-out to the translator. He defines rows and
columns within the report. Subsequently he tells the translator how to manipulate the rows and
columns in order to transform the information in the report to the statistical information asked
for by the combined questionnaire. The resulting records are sent over to Statistics Netherlands.



The Translator 




We see then the two parts of the translation scheme. The first part lays down the lay-out of
the printfiles to make the technical transformation. The second part defines the conceptual
transformation of the information to be found on the printfile towards the statistical
information asked for on the combined questionnaire.
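
The two parts of such a translation scheme can be illustrated with a minimal sketch. It is not
the Pilot 2 software: the fixed-width report format, the account names and the names Layout,
read_printfile and translate are all assumptions made for the illustration. The first part
describes where rows and columns sit in the print-file; the second part combines the resulting
cells into questionnaire variables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Cells = Dict[Tuple[str, str], float]


@dataclass
class Layout:
    """Technical part: where rows and columns sit in a (hypothetical) print-file."""
    label_width: int                     # characters reserved for the row label
    columns: Dict[str, Tuple[int, int]]  # column name -> (start, end) positions


def read_printfile(lines: List[str], layout: Layout) -> Cells:
    """Turn fixed-width report lines into cells keyed by (row label, column name)."""
    cells: Cells = {}
    for line in lines:
        label = line[:layout.label_width].strip()
        if not label:
            continue  # blank or decorative line
        for name, (start, end) in layout.columns.items():
            text = line[start:end].strip().replace(",", "")
            try:
                cells[(label, name)] = float(text)
            except ValueError:
                pass  # header text carries no number
    return cells


def translate(cells: Cells, rules: Dict[str, Callable[[Cells], float]]) -> Dict[str, float]:
    """Conceptual part: compute each questionnaire variable from report cells."""
    return {variable: rule(cells) for variable, rule in rules.items()}


# Hypothetical report, built with format specs so the column positions are exact.
REPORT = [
    f"{'Account':<16}{'Debit':>10}{'Credit':>10}",
    f"{'Sales domestic':<16}{0.0:>10.2f}{1200.5:>10.2f}",
    f"{'Sales export':<16}{0.0:>10.2f}{300.25:>10.2f}",
    f"{'Raw materials':<16}{450.0:>10.2f}{0.0:>10.2f}",
]
LAYOUT = Layout(label_width=16, columns={"debit": (16, 26), "credit": (26, 36)})
RULES = {
    "total_turnover": lambda c: c[("Sales domestic", "credit")] + c[("Sales export", "credit")],
    "export_turnover": lambda c: c[("Sales export", "credit")],
}

cells = read_printfile(REPORT, LAYOUT)
result = translate(cells, RULES)
print(result)
```

In this sketch the respondent would supply both LAYOUT and RULES once; the NSI only receives
the computed variables.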

The final question is who will make that translation scheme. One of the principles of Pilot 2
is that "the respondent translates". This means that the respondent himself has to set up the
translation scheme. This of course makes it less respondent friendly. It seemed however
impossible to set up those translation schemes at Statistics Netherlands. It is clear that this is
not an easy task for the respondent. On the one hand this means that a strong help-desk and a
fairly large field service are needed, and on the other hand this means that even with Pilot 2 we
will not yet reach the ultimate user-friendliness of EDI.

We expect the translation scheme to be fairly stable or, in other words, that technical and
conceptual changes will not be too frequent. A second time the translator can use the already
available translation scheme to produce the statistical information. Answering the combined
questionnaire then becomes a matter of minutes instead of hours and can be handled by a less
qualified employee. That is what makes the concept attractive and the initial investment
worthwhile to the respondent.
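
Why a stable scheme pays off on the second use can be shown with a small sketch. It assumes,
purely for illustration, that a scheme is stored as JSON and that each variable is the sum of
named report cells; save_scheme, apply_scheme and the cell names are hypothetical.

```python
import json
from typing import Dict, List


def save_scheme(scheme: Dict[str, List[str]]) -> str:
    """Store the scheme once, after the respondent has set it up."""
    return json.dumps(scheme, sort_keys=True)


def apply_scheme(scheme_text: str, cells: Dict[str, float]) -> Dict[str, float]:
    """Reuse the stored scheme on a new period's report cells.

    A KeyError here means a cell named in the scheme is missing, i.e. the
    report lay-out changed and the scheme needs maintenance.
    """
    scheme = json.loads(scheme_text)
    return {variable: sum(cells[name] for name in parts)
            for variable, parts in scheme.items()}


# Set up once (the costly step for the respondent).
stored = save_scheme({
    "total_turnover": ["Sales domestic/credit", "Sales export/credit"],
})

# Applied every period without further set-up work.
march = apply_scheme(stored, {"Sales domestic/credit": 1200.5, "Sales export/credit": 300.25})
april = apply_scheme(stored, {"Sales domestic/credit": 1000.0, "Sales export/credit": 250.0})
print(march, april)
```

The set-up cost is paid once; each later period only requires running the stored scheme on
the new print-file.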



10. SCOPE OF PILOT 2 

As said, Pilot 2 is directed towards the financial accounts. The principle is that all the
information that is tapped from the financial accounts by any statistic of Statistics
Netherlands will go through Pilot 2 if automated retrieval of that information is possible. In
practice this means that several large statistics will switch completely to EDI. For industry,
our main target, we find:

• Monthly statistics on total turnover

• Monthly statistics on foreign trade, by product

• Quarterly statistics on turnover by product

• Yearly statistics on gross investment

• Yearly statistics on the production process

• Yearly statistics on the financial processes, incl. balance sheets

The participation of foreign trade is a pilot within the pilot. Not only does Statistics
Netherlands already have a successful EDI on this area in IRIS, but also the possibilities of
getting enough foreign trade data when aiming in the first place at the financial accounts still
have to be researched.

Some questions in the above mentioned statistics are dropped, e.g. the questions on quantities
of energy used in the production statistics. They cannot be addressed by this form of EDI.
Probably a separate paper questionnaire on this subject will be sent.

On the other hand, some questions originating from other statistics mainly aimed at other
subjects and accounts (e.g. the labour and wage accounts) are included because the answers
are typically to be found within the financial accounts of the enterprise.

The domain of EDI consists of those commercial enterprises that have set up financial
accounts by means of computer software that satisfies certain technical specifications. In
practice this means that we direct ourselves towards the profit sector within industry, trade
and services. We start with industry because there the gains in terms of lessening the
respondents' burden will be the largest. Individual smaller enterprises are not included
because their bookkeeping and automation capacities are expected to be too low. In view of
the relatively small amount of information asked here, more is expected from centrally kept
records (VAT, corporate tax) and from bookkeeping bureaus often keeping books for
hundreds of smaller enterprises. The very large enterprises are also excluded. Because of
their complexity they need an individual approach, of course in the end also by means of EDI
but then "tailor made".

Regarding the number of respondents participating in this kind of EDI, we should mention that
in pilot 1 a number of 12 respondents participated and still do. Pilot 2 will start with a field
test next March aimed at 20 respondents. Starting September 1996 we aim at larger numbers.
By end 1996 Pilot 2 should handle several hundreds of respondents. Pilot 2 will also be used
to approach the bookkeeping bureaus. That will lead to larger numbers of statistical units
described with one EDI-link. If EDI-Pilot 2 is successful we will, following pilot 2, in 1997
aim at a number of 25,000 units to be approached with this instrument, partly through the
bookkeeping bureaus.

The revenue of Pilot 2, if successful, will in the first place be a relief of the respondents'
burden. Productivity gains will not be that large. In the first place all kinds of activities
remain. Not every respondent will participate, data will still have to be checked etc. In the
second place new activities arise in the form of a growing help-desk and a field-service that
will not only have to cope with bookkeeping problems but also with technical automation
problems.

A project similar to the Dutch EDI Pilot 2 is the so-called TELER project. In TELER various
European NSIs work together (under Dutch supervision) to test the EDI concept in
statistical data collection. The TELER project, which runs from 1996 to 1998, is partly
financed by the EEC.



11. CONTROLLING PILOT 2: THE META-SYSTEM

Eventually Statistics Netherlands aims to reach several thousands of respondents using EDI.
This of course asks for a control system to deal with the production of the appropriate
electronic questionnaire, sending it to the respondent, checking the response, checking and
storing the incoming data and controlling possible feedback etc. This means that a lot of
information, meta-information, on the respondents has to be kept updated.

Another part of the meta-information deals with the contents of the combined questionnaire.
As an example we will focus on that part. Constructing the combined questionnaire we need
to co-ordinate the approach of the different statistics aimed at the financial records among
each other but also with the bookkeeping practices of the respondents. Of course the latter
already happened before but with EDI it will become more explicit. This needed some
negotiation. It is clear that with EDI up and running, much of the former autonomy of the
individual statistics, especially regarding their questionnaire, disappears.

The module containing the translator gives us better opportunities for supplying meta-
information to the respondent than before. There are the usual on-line help-functions. By
means of hypertext the explanations are linked. For the help-desk and for the field service
probably a more detailed system of help-functions and explanations will be set up. The
system not only contains cross-linkages but also simple computational rules so that for
instance totals can be computed.




To this end a set of variables was laid down in a database, with names, question texts,
explanations and, if necessary, computational relations with other variables. From this
database, variables, question texts, explanations etc. are selected and combined to questionnaires.
Respondents are classified into clusters by size, branch of activity and type of financial
records kept. Sometimes sale-records are kept by the enterprise itself but yearly balance
sheets are set up by a bookkeeping bureau. For that statistical unit the total of the information
needed will have to be collected by two different questionnaires directed towards two
different reporting units. Each cluster gets its own combined questionnaire.
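
A database of variables with computational relations and cluster-specific questionnaires might
be sketched as follows. The class names, variable names and cluster labels are hypothetical, and
the only computational relation modelled is "this variable is the total of others"; the actual
meta-system is not described in enough detail here to reproduce it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Variable:
    name: str
    question: str
    explanation: str = ""
    total_of: Tuple[str, ...] = ()     # computational relation: sum of these variables
    clusters: frozenset = frozenset()  # clusters whose questionnaire includes it


class VariableDatabase:
    def __init__(self) -> None:
        self._variables: Dict[str, Variable] = {}

    def add(self, variable: Variable) -> None:
        # Assumption: a variable may only refer to variables already in the database.
        missing = [n for n in variable.total_of if n not in self._variables]
        if missing:
            raise ValueError(f"{variable.name} refers to unknown variables: {missing}")
        self._variables[variable.name] = variable

    def questionnaire(self, cluster: str) -> List[Variable]:
        """Select the variables that make up one cluster's combined questionnaire."""
        return [v for v in self._variables.values() if cluster in v.clusters]

    def fill_totals(self, answers: Dict[str, float]) -> Dict[str, float]:
        """Compute totals that were not answered but whose parts are all known."""
        completed = dict(answers)
        # Dicts keep insertion order, and add() guarantees parts precede totals.
        for v in self._variables.values():
            if v.total_of and v.name not in completed and all(p in completed for p in v.total_of):
                completed[v.name] = sum(completed[p] for p in v.total_of)
        return completed


db = VariableDatabase()
db.add(Variable("turnover_domestic", "What was your domestic turnover?",
                clusters=frozenset({"industry_medium", "trade_medium"})))
db.add(Variable("turnover_export", "What was your export turnover?",
                clusters=frozenset({"industry_medium"})))
db.add(Variable("turnover_total", "What was your total turnover?",
                total_of=("turnover_domestic", "turnover_export"),
                clusters=frozenset({"industry_medium"})))

trade_names = [v.name for v in db.questionnaire("trade_medium")]
completed = db.fill_totals({"turnover_domestic": 100.0, "turnover_export": 40.0})
print(trade_names, completed)
```

Because every questionnaire is assembled from the same database, a question cannot reach a
respondent without first being defined, and related, there.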



12. EDI: CONCLUSIONS

In this way a large set of meta-information on concepts emerges. This meta-information
controls the process of data-collection. A question aimed at the financial records can only get
there through the central database of variables. When entering the variable, the relation with
the rest of the contents will have to be made clear. It has to fit in.

In the first place we now see that the character of meta-information has changed. In most of
the literature we often find meta-information as a mere descriptive piece of information only
available if the statistician has found the time to set it up, mostly after he has produced his
statistic, for the benefit of the user. If later on the statistician diverges from his earlier meta-
information there is nothing to stop him and nothing that guarantees that the meta-
information will be adapted.

Here we find a piece of meta-information that has to be set up before the production process
starts. The statistician cannot but use the meta-information system. The meta-information has
become a tool in the production process. From being descriptive it has come to be
prescriptive. Earlier we saw the same thing happening with data dissemination and data-
collection among households through BLAISE.

This however has further reaching consequences. We can now go back to the first sections of
this paper. There we spoke of the extra demands put to NSIs. One of them was less
respondents' burden. That was the first goal of EDI-Pilot 2. But we also see here how the
technology push gives us some opportunities to answer another demand, namely that for more
coherence. It goes without doubt that the way EDI is implemented here will lead to a larger
extent of statistical (conceptual) co-ordination. We mentioned the power of the meta-system
and we also see that within EDI a number of statistics is combined that were earlier produced
in separate, independent processes. Remarkable is the fact that this growth in statistical co-
ordination is not reached by an increase in central directives but as a side-product of the tools
used in the production process. We do not think that all the problems of the coherence of our
end-product, that means all the problems of statistical co-ordination, can be solved by
devising the proper tool. We do think however that further improvements can be made in this
field by applying the possibilities of the technology push in the right way.




